IMAGE PROCESSING DEVICE 

BACKGROUND OF THE INVENTION 
Field of the Invention 

The present invention relates to an image processing 
device adaptable to various encoding methods. 
Description of the Prior Art 

Fig. 9 is a block diagram showing the structure of a prior 
art image processing device disclosed, for example, in "MPEG-4 
LSI, Internet, and Broadcast Services", Journal of the 
Institute of Image Information and Television Engineers, Vol. 53, 
No. 4, 1999. In Fig. 9, reference numeral 201 denotes an 
instruction memory for storing a program, numeral 202 denotes a 
VLE (Variable Length Encode) unit for performing variable-length 
encoding, numeral 203 denotes a VLD (Variable Length Decode) unit 
for performing variable-length decoding, numeral 204 denotes a 
memory provided for the VLD unit 203, numeral 205 denotes a 
motion compensation unit for performing motion compensation 
processing, numeral 206 denotes a motion prediction unit A for 
performing motion prediction processing, numeral 207 denotes a 
motion prediction unit B for performing motion prediction 
processing, numeral 208 denotes a DCT (Discrete Cosine Transform) 
unit for performing DCT processing, and numeral 209 denotes an 
IDCT (Inverse Discrete Cosine Transform) unit for performing IDCT 
processing. 

Furthermore, in Fig. 9, reference numeral 220 denotes an 
external memory for holding the values of a picture signal, 
numerals 230a to 230f denote local memories built into a 
processor 211, which will be described below, the motion 
compensation unit 205, the motion prediction unit A 206, the 
motion prediction unit B 207, the DCT unit 208, and the IDCT unit 
209, respectively, and numeral 210 denotes a DMA (Direct Memory 
Access) control unit for controlling the local memories 230a to 
230f and the external memory 220. The processor 211 can control 
the VLE unit 202, the VLD unit 203, and the DMA control unit 210. 

In operation, when the prior art image processing device 
performs processing such as motion compensation, motion 
prediction, DCT processing, or IDCT processing, a dedicated block 
actually carries out the processing. That is, the motion 
compensation unit 205 carries out the motion compensation 
processing, the motion prediction units A and B 206 and 207 carry 
out the motion prediction processing, the DCT unit 208 carries 
out the DCT processing, and the IDCT unit 209 carries out the 
IDCT processing. 

Furthermore, when the prior art image processing device 
performs quantization processing, the processor 211 carries out 
the quantization processing. 

A problem with the prior art image processing device 
constructed as above is that the motion compensation unit 205, 
the motion prediction unit A 206, the motion prediction unit 
B 207, the DCT unit 208, and the IDCT unit 209 are blocks specific 
to the algorithm of a given encoding method, and therefore the 
prior art image processing device cannot support various 
encoding methods. Another problem is that, since the quantization 
processing is carried out by the processor 211 rather than by a 
block dedicated to quantization, the number of clock cycles 
required for the quantization processing increases. 
SUMMARY OF THE INVENTION 
The present invention is proposed to solve the above-mentioned 
problems, and it is therefore an object of the present 
invention to provide an image processing device that can support 
various encoding methods and reduce the number of clock cycles 
required for the image processing. 

In accordance with an aspect of the present invention, 
there is provided an image processing device comprising: an SIMD 
(Single Instruction stream Multiple Data stream) calculating 
unit for performing operations, such as motion compensation, 
motion prediction, DCT processing, IDCT processing, 
quantization, and reverse quantization, by means of a pipeline 
operation unit that can be program-controlled by an outside 
unit; a VLC (Variable Length Code) processing unit for 
performing variable-length encoding processing and 
variable-length decoding processing according to a given 
encoding method; an external data interface for performing a 
data transfer between the image processing device and an outside 
unit; an instruction memory for holding an instruction to be 
processed; and a processor for decoding the instruction held 
in the instruction memory, and for performing a programmed 
control operation on the SIMD calculating unit, the VLC 
processing unit, and the external data interface. The image 
processing device can thus support various encoding methods, 
and can reduce the number of clock cycles required for image 
processing. 

In accordance with another aspect of the present 
invention, the image processing device includes a RAM as the 
instruction memory. The image processing device can thus 
support various encoding methods with a single LSI. 

In accordance with a further aspect of the present 
invention, the image processing device includes a ROM as the 
instruction memory. Thus, the area of the LSI and the cost of 
the image processing device can be reduced. 
Further objects and advantages of the present invention 
will be apparent from the following description of the preferred 
embodiments of the invention as illustrated in the accompanying 
drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 
Fig. 1 is a block diagram showing the structure of an image 
processing device according to a first embodiment of the present 
invention; 

Fig. 2 is a flow chart showing processing performed by 
the image processing device according to the first embodiment 
of the present invention; 

Fig. 3 is a block diagram showing the structure of an SIMD 
calculating unit of the image processing device according to 
the first embodiment of the present invention; 

Fig. 4 is a diagram showing the elements of two matrices 
whose product is calculated by the SIMD calculating unit shown 
in Fig. 3 of the image processing device according to the first 
embodiment of the present invention; 

Fig. 5 is a diagram showing a pipeline operation of the 
SIMD calculating unit shown in Fig. 3 of the image processing 
device according to the first embodiment of the present 
invention when performing the multiplication of the two 
matrices shown in Fig. 4; 

Fig. 6 is a graph showing a comparison between the number 
of clock cycles required for a general-purpose processor alone 
to perform image processing on each macro block, and the number 
of clock cycles required for a VLC processing unit to perform 
the image processing on each macro block in cooperation with 
the general-purpose processor; 
Fig. 7 is a graph showing a comparison between the number 
of clock cycles required for a general-purpose processor alone 
to perform the image processing on each macro block, and the 
number of clock cycles required for the SIMD calculating unit 
to perform the image processing on each macro block in 
cooperation with the general-purpose processor; 

Fig. 8 is a graph showing a comparison between the number 
of clock cycles required for a general-purpose processor alone 
to perform the image processing on each macro block, and the 
number of clock cycles required for both the VLC processing unit 
and the SIMD calculating unit of the image processing device 
according to the first embodiment of the present invention to 
perform the image processing on each macro block in cooperation 
with the general-purpose processor; and 

Fig. 9 is a block diagram showing the structure of a prior 
art image processing device. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 
Embodiment 1. 

Fig. 1 is a block diagram showing the structure of an image 
processing device according to a first embodiment of the present 
invention. In the figure, reference numeral 101 denotes an SIMD 
(Single Instruction stream Multiple Data stream) calculating 
unit for performing operations, such as motion compensation, 
motion prediction, DCT processing, IDCT processing, 
quantization, and reverse quantization, by means of a pipeline 
operation unit that can be program-controlled by an outside 
unit, numeral 102 denotes a VLC processing unit for performing 
variable-length encoding processing and variable-length 
decoding processing according to a given encoding method, and 
numeral 103 denotes an external data interface for performing 
a data transfer between the image processing device and an 
outside unit. 

Furthermore, in Fig. 1, reference numeral 104 denotes an 
instruction memory for holding instructions to be processed 
by the image processing device, and numeral 105 denotes a 
processor for performing scalar calculating operations and bit 
handling operations, for executing comparison instructions and 
branch instructions, for decoding the instructions held in the 
instruction memory 104, and for controlling the SIMD 
calculating unit 101, the VLC processing unit 102, the external 
data interface 103, a video input device 201, and a video output 
device 202, which will be described below. The video input 
device 201 of Fig. 1 can accept a video signal from an outside 
unit, and the video output device 202 can deliver a video signal 
to an outside unit. An external memory 203 can hold a video 
signal from either the video input device 201 or the external 
data interface 103. 

In addition, in Fig. 1, reference numeral 151 denotes a 
32-bit video data bus for connecting the external data interface 
103 to the video input device 201, the video output device 202, 
and the external memory 203; numerals 152 and 153 denote I/O 
control signals, carried on a line connecting the processor 105 
to the video input device 201 and on a line connecting the 
processor 105 to the video output device 202, respectively, for 
controlling the input and output of video signals; and numeral 
154 denotes a 32-bit internal data bus for connecting the SIMD 
calculating unit 101, the VLC processing unit 102, and the 
external data interface 103 with one another. 

Fig. 2 is a flow chart showing the encoding processing 
performed by the image processing device according to the first 
embodiment of the present invention. In step ST1, the image 
processing device transfers image data A from the video input 
device 201 to the external memory 203. In step ST2, the image 
processing device transfers the pixel data B of the image data A 
that the SIMD calculating unit 101 needs for its processing from 
the external memory 203 to the external data interface 103. In 
step ST3, the SIMD calculating unit 101 performs motion 
compensation, DCT processing, and quantization so as to obtain 
conversion coefficient data C. In step ST4, the VLC processing 
unit 102 converts the conversion coefficient data C into a 
variable-length code. In step ST5, the VLC processing unit 102 
outputs bit stream data D as the result of the processing of 
step ST4. 
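The description leaves the variable-length code of step ST4 to the "given encoding method". As a hedged illustration only, assuming an MPEG-style scheme, the sketch below shows a common front end of that step: the quantized conversion coefficient data C of one 8 by 8 block is read in zigzag order and reduced to (run, level) pairs, which a VLC table would then map to codewords. The functions `zigzag_order` and `run_level_pairs` are hypothetical names, and the codeword tables themselves are not modeled.

```python
# Hypothetical sketch of the front end of step ST4 (MPEG-style scheme assumed):
# zigzag scan of a quantized 8x8 block, then (run, level) pairs.

def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even anti-diagonals are walked bottom-left to top-right,
        # odd ones top-right to bottom-left.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def run_level_pairs(block):
    """Reduce a quantized block to (run, level) pairs in zigzag order.

    run   = number of zero coefficients skipped before a nonzero one
    level = value of that nonzero coefficient
    Trailing zeros are dropped; a real coder would emit an end-of-block code.
    """
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        level = block[r][c]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs


# Example: a block with three nonzero quantized coefficients.
C = [[0] * 8 for _ in range(8)]
C[0][0], C[0][1], C[2][0] = 12, -3, 1
print(run_level_pairs(C))  # [(0, 12), (0, -3), (1, 1)]
```

Most of the 64 quantized coefficients are typically zero, so coding runs of zeros rather than individual values is what makes the later variable-length code short.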

Next, the multiplication of two matrices with 8 rows and 8 
columns will be described as an example of the encoding 
processing carried out during the DCT processing performed by the 
SIMD calculating unit 101. Fig. 3 is a block diagram showing the 
structure of the SIMD calculating unit, which consists of 16 
memories in parallel and 8 pipeline calculating units in 
parallel. In the figure, reference numerals 301a-1, 301a-2, 
301b-1, 301b-2, 301c-1, 301c-2, ..., 301d-1, and 301d-2 denote the 
16 parallel memories, and numerals 311a, 311b, 311c, ..., and 311d 
denote the 8 parallel pipeline calculating units. The SIMD 
calculating unit is divided into 8 units, Unit#0 to Unit#7. 
Unit#0 consists of the two memories 301a-1 and 301a-2 and the 
pipeline calculating unit 311a, and each of Unit#1, Unit#2, ..., 
and Unit#7 likewise consists of two memories and one pipeline 
calculating unit. 
Furthermore, each of the eight pipeline calculating units 
of Fig. 3 includes an adder/subtracter 351 for performing 
addition and subtraction operations, a multiplier 352 for 
performing multiplication operations, a difference calculator 353 
for performing difference operations, an accumulator 354 for 
performing accumulation operations, a shifting/rounding unit 355 
for performing shift and rounding operations, a clipping unit 356 
for performing clipping operations, and registers 361a to 361g, 
each for holding an operation result. 
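As a hedged sketch of how several of these stages could be chained for quantization, the code below divides a coefficient by a quantizer step using the multiplier 352 (multiplying by a precomputed fixed-point reciprocal), then the shifting/rounding unit 355, then the clipping unit 356. The 16-bit fixed-point precision, the reciprocal computed by the processor 105, the round-half-up rule, and the 12-bit clipping range are all assumptions for illustration; the description does not give these parameters.

```python
# Hypothetical fixed-point quantizer built from three pipeline stages.
SHIFT = 16                      # assumed fixed-point precision
CLIP_LO, CLIP_HI = -2048, 2047  # assumed 12-bit output range


def multiplier(a, b):
    """Stand-in for the multiplier 352."""
    return a * b


def shift_round(x, shift):
    """Stand-in for the shifting/rounding unit 355: add half an LSB, then
    arithmetic right shift (round half up)."""
    return (x + (1 << (shift - 1))) >> shift


def clip(x, lo, hi):
    """Stand-in for the clipping unit 356."""
    return max(lo, min(hi, x))


def quantize(coeff, qstep):
    # Reciprocal of the step, assumed precomputed once by the processor 105;
    # it is exact only when qstep divides 2**SHIFT.
    recip = (1 << SHIFT) // qstep
    return clip(shift_round(multiplier(coeff, recip), SHIFT), CLIP_LO, CLIP_HI)


print([quantize(c, 16) for c in (100, -100, 40, 7)])  # [6, -6, 3, 0]
```

Replacing the division by a multiply-and-shift is what lets the quantization run in the pipeline stages instead of on the processor, which is the clock-cycle problem the background section attributes to the prior art.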

Fig. 4 is a diagram showing the elements of a matrix X 
and the elements of a matrix Y, on which a matrix multiplication 
is performed. Before the sum of the element-by-element products 
of the elements in the first row of the matrix X and the 
corresponding elements in the first column of the matrix Y is 
calculated, all the elements in the first row of the matrix X, 
i.e., X1, X2, ..., X8, are held in each of the memories 301a-1, 
301b-1, 301c-1, ..., and 301d-1. The memory 301a-2 holds all the 
elements in the first column of the matrix Y, i.e., Y1, Y2, ..., 
Y8, the memory 301b-2 holds all the elements in the second column 
of the matrix Y, i.e., Y9, Y10, ..., Y16, and in the same way, the 
remaining memories 301c-2, ..., and 301d-2 hold all the elements 
in the third to eighth columns of the matrix Y, respectively. 

Unit#0 then calculates the sum of the element-by-element 
products of the elements in the first row of the matrix X and 
the corresponding elements in the first column of the matrix Y, 
i.e., X1*Y1 + X2*Y2 + ... + X8*Y8. Unit#1 calculates the sum of 
the element-by-element products of the elements in the first row 
of the matrix X and the corresponding elements in the second 
column of the matrix Y. In the same way, Unit#i (i = 2 to 7) 
calculates the sum of the element-by-element products of the 
elements in the first row of the matrix X and the corresponding 
elements in the (i+1)th column of the matrix Y. The eight units 
thus produce the eight elements of the first row of the product 
matrix at the same time. 
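The data layout above can be modeled in a few lines. In this hedged sketch, the class `Unit` and the function `row_times_matrix` are hypothetical names, and one iteration of the outer loop stands in for one broadcast instruction. Each of the eight units holds its own copy of the current row of X and one column of Y, and all units execute the same multiply-accumulate step in lockstep, yielding one full row of X*Y.

```python
N = 8  # matrix size and number of units (Unit#0 to Unit#7)


class Unit:
    """Hypothetical model of one unit: two local memories plus an accumulator."""

    def __init__(self, x_row, y_col):
        self.mem_x = list(x_row)  # like memory 301a-1: copy of the current row of X
        self.mem_y = list(y_col)  # like memory 301a-2: one column of Y
        self.acc = 0

    def step(self, k):
        """One broadcast instruction: multiply the k-th pair and accumulate."""
        self.acc += self.mem_x[k] * self.mem_y[k]


def row_times_matrix(x_row, Y):
    """Compute one row of X*Y with N units working in lockstep."""
    # Unit#j receives the same row of X and the j-th column of Y.
    units = [Unit(x_row, [Y[k][j] for k in range(N)]) for j in range(N)]
    for k in range(N):
        for unit in units:  # every unit executes the same step
            unit.step(k)
    return [unit.acc for unit in units]


identity = [[int(i == j) for j in range(N)] for i in range(N)]
print(row_times_matrix([1, 2, 3, 4, 5, 6, 7, 8], identity))  # [1, 2, 3, 4, 5, 6, 7, 8]
```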

Fig. 5 is a diagram showing the pipeline operation of 
Unit#0 when the SIMD calculating unit 101 performs the 
multiplication of the two 8 by 8 matrices shown in Fig. 4. In 
the first cycle of the pipeline operation, Unit#0 transfers the 
element X1 of the matrix X from the memory 301a-1 to the pipeline 
calculating unit 311a, and also transfers the element Y1 of the 
matrix Y from the memory 301a-2 to the pipeline calculating unit 
311a. In the second cycle, the multiplier 352 of the pipeline 
calculating unit 311a multiplies X1 by Y1, and Unit#0 
simultaneously transfers the element X2 from the memory 301a-1 
and the element Y2 from the memory 301a-2 to the pipeline 
calculating unit 311a. In the third cycle, the multiplier 352 
multiplies X2 by Y2, and Unit#0 simultaneously transfers the 
element X3 from the memory 301a-1 and the element Y3 from the 
memory 301a-2 to the pipeline calculating unit 311a. In the 
fourth cycle, the accumulator 354 of the pipeline calculating unit 
311a calculates the sum of X1*Y1 and X2*Y2; in the same cycle, 
the multiplier 352 multiplies X3 by Y3, and Unit#0 
simultaneously transfers the element X4 from the memory 301a-1 
and the element Y4 from the memory 301a-2 to the pipeline 
calculating unit 311a. From the fourth cycle on, transfer, 
multiplication, and accumulation therefore proceed concurrently. 
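The cycle-by-cycle behavior described above can be checked with a small model. The sketch below assumes, for illustration, that each stage (transfer, multiplication, accumulation) takes exactly one cycle and that the first product is only latched in the third cycle, so the first addition (X1*Y1 + X2*Y2) happens in the fourth cycle as in the text. Under these assumptions an 8-element sum of products finishes in 10 cycles. The function `unit0_schedule` is a hypothetical name.

```python
def unit0_schedule(x, y):
    """Cycle-by-cycle model of Unit#0 computing sum(x[i] * y[i]).

    Assumptions (for illustration): every stage takes one cycle, and the
    first product is only latched, so the first addition is X1*Y1 + X2*Y2.
    Returns (sum of products, list of events per cycle).
    """
    n = len(x)
    nxt = 0          # index of the next pair to transfer from 301a-1/301a-2
    loaded = None    # pair transferred in the previous cycle
    product = None   # product from multiplier 352 in the previous cycle
    acc = None       # accumulator 354; None until the first product arrives
    log = []
    while nxt < n or loaded is not None or product is not None:
        events = []
        # Accumulate stage: consume the previous cycle's product.
        if product is not None:
            if acc is None:
                acc = product
                events.append(f"latch {product}")
            else:
                acc += product
                events.append(f"acc={acc}")
        # Multiply stage: multiply the pair transferred in the previous cycle.
        product = loaded[0] * loaded[1] if loaded is not None else None
        if product is not None:
            events.append(f"mul={product}")
        # Transfer stage: fetch the next pair from the local memories.
        if nxt < n:
            loaded = (x[nxt], y[nxt])
            events.append(f"load X{nxt + 1},Y{nxt + 1}")
            nxt += 1
        else:
            loaded = None
        log.append(events)
    return acc, log


total, log = unit0_schedule([1, 2, 3, 4, 5, 6, 7, 8], [8, 7, 6, 5, 4, 3, 2, 1])
for cycle, events in enumerate(log, start=1):
    print(cycle, events)
print("result:", total, "cycles:", len(log))  # result: 120 cycles: 10
```

The fourth entry of `log` is `["acc=22", "mul=18", "load X4,Y4"]`, which corresponds to the fourth cycle described for Fig. 5.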

In the same way that Unit#0 calculates the sum of the 
element-by-element products of the elements in the first row of 
the matrix X and the corresponding elements in the first column 
of the matrix Y, each of Unit#1 to Unit#7 performs a similar 
operation. The SIMD calculating unit performs the multiplication 
of the two 8 by 8 matrices by repeating the above-mentioned 
processes by means of Unit#0 to Unit#7 for each row of the 
matrix X. 
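The description presents this matrix multiplication as part of the DCT processing but does not give the DCT formula. A common formulation, assumed here, writes the two-dimensional 8 by 8 DCT of a pixel block X as F = C X C^T and the IDCT as X = C^T F C, where C is the orthonormal DCT-II basis matrix C[u][k] = a(u) cos((2k+1)u pi / 16), with a(0) = sqrt(1/8) and a(u) = 1/2 otherwise. Both transforms then reduce to two 8 by 8 matrix products, so the same matrix-multiplication routine serves both. In the sketch below, `matmul` computes the product one output row at a time, as Unit#0 to Unit#7 would; all function names are hypothetical.

```python
import math

# Assumed formulation: the description does not specify how the DCT is computed.
N = 8


def dct_matrix(n=N):
    """Orthonormal DCT-II basis: C[u][k] = a(u) * cos((2k + 1) * u * pi / (2n))."""
    rows = []
    for u in range(n):
        a = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
        rows.append([a * math.cos((2 * k + 1) * u * math.pi / (2 * n)) for k in range(n)])
    return rows


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Product computed one output row at a time, as Unit#0..Unit#7 would."""
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def dct2(block):
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))   # F = C X C^T


def idct2(coeffs):
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)  # X = C^T F C


flat = [[10] * N for _ in range(N)]
F = dct2(flat)
print(round(F[0][0], 6))  # 80.0
```

A flat block puts all of its energy into the DC coefficient F[0][0] (8 times the pixel value), which is a quick sanity check on the basis matrix.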

Next, the number of clock cycles required for image 
processing will be explained. In general, support for various 
encoding methods is implemented with a general-purpose 
processor. Fig. 6 is a graph showing a comparison between the 
number of clock cycles required for a general-purpose processor 
alone, such as the processor 105, to perform image processing on 
each macro block, and the number of clock cycles required for 
the VLC processing unit 102 to perform the image processing on 
each macro block in cooperation with the general-purpose 
processor. Although the number of clock cycles required for the 
image processing can be reduced by using the VLC processing unit 
102, as can be seen from Fig. 6, many clock cycles are still 
needed for the matrix calculating operations, and the reduction 
is not sufficient. 
Fig. 7 is a graph showing a comparison between the number 
of clock cycles required for a general-purpose processor alone 
to perform image processing on each macro block, and the number 
of clock cycles required for the SIMD calculating unit 101 to 
perform the image processing on each macro block in cooperation 
with the general-purpose processor. Although the number of 
clock cycles required for the image processing can be reduced 
by using the SIMD calculating unit 101, as can be seen from Fig. 
7, many clock cycles are still needed for the VLC operations, 
and the reduction is not sufficient. 

Fig. 8 is a graph showing a comparison between the number 
of clock cycles required for a general-purpose processor alone 
to perform image processing on each macro block, and the number 
of clock cycles required for the VLC processing unit 102 and 
the SIMD calculating unit 101 to perform the image processing 
on each macro block in cooperation with the general-purpose 
processor. As can be seen from Fig. 8, the number of clock 
cycles required for the image processing can be reduced 
sufficiently by using both the VLC processing unit 102 and the 
SIMD calculating unit 101 together with the general-purpose 
processor. 

The image processing device constructed as above can 
support various encoding methods because the processor 105 
decodes a program, read out of the instruction memory 104, for 
controlling the SIMD calculating unit 101, the VLC processing 
unit 102, and the external data interface 103, and the image 
processing device therefore performs programmed control of the 
SIMD calculating unit 101, the VLC processing unit 102, and the 
external data interface 103. 
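A minimal, hypothetical model of this programmed control is shown below. The opcodes (`LOAD`, `DCT`, `QUANT`, `VLC`), the `DECODE` table, the operands `"mb0"` and `"table_a"`, and the stand-in unit functions are all invented for illustration; the description does not define an instruction set. In this model the processor 105 only fetches, decodes, and dispatches, so the sequence of operations, and hence the encoding method, is determined by the program rather than by fixed-function hardware.

```python
# Hypothetical instruction set: opcode -> (unit, operation).
DECODE = {
    "LOAD": ("edi", "load"),     # external data interface 103
    "DCT": ("simd", "dct"),      # SIMD calculating unit 101
    "QUANT": ("simd", "quant"),
    "VLC": ("vlc", "encode"),    # VLC processing unit 102
}


def make_unit(name):
    """Stand-in for a hardware block: records the operation it was given."""
    return lambda operation, *args: (name, operation, args)


def run_program(program, units):
    """Model of the processor 105: fetch, decode, and dispatch each instruction."""
    trace = []
    for opcode, *operands in program:                         # fetch
        unit_name, operation = DECODE[opcode]                 # decode
        trace.append(units[unit_name](operation, *operands))  # dispatch
    return trace


units = {name: make_unit(name) for name in ("edi", "simd", "vlc")}
program = [("LOAD", "mb0"), ("DCT",), ("QUANT", 16), ("VLC", "table_a")]
for entry in run_program(program, units):
    print(entry)
```

In this model, supporting another encoding method means supplying a different `program` list; in hardware terms, different contents of the instruction memory 104.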

While a prior art image processing device includes a DCT 
unit and an IDCT unit disposed separately, the image processing 
device of the present embodiment implements both DCT processing 
and IDCT processing using only the SIMD calculating unit 101, 
because DCT processing and IDCT processing are not carried out 
at the same time, thus reducing the amount of hardware. 

In addition, whereas in the prior art image processing 
device the motion compensation unit, the motion prediction unit A, 
and the motion prediction unit B can operate at the same time 
when motion compensation is performed, the SIMD calculating unit 
101 of the image processing device of the present embodiment can 
perform motion compensation at high speed even though it is a 
single block, because the SIMD calculating unit 101 can process 
image data in parallel. 

An adaptive video signal processor disclosed in Japanese 
patent application publication No. 6-292178 and a programmable 
processor disclosed in Japanese patent application publication 
No. 8-50575 are conventional technologies related to the 
present invention. However, neither of them includes any unit 
corresponding to the VLC processing unit 102 according to 
the first embodiment. Since in the image processing device 
according to the present embodiment the SIMD calculating unit 
101 and the VLC processing unit 102 can operate in parallel, 
the image processing device can implement image processing 
efficiently with fewer clock cycles. 
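The benefit of running the two units in parallel can be illustrated with simple pipeline arithmetic. Assume, hypothetically, that the SIMD calculating unit 101 needs `simd` cycles and the VLC processing unit 102 needs `vlc` cycles per macro block, and that the VLC processing unit codes macro block i while the SIMD calculating unit processes macro block i+1. Then n macro blocks take simd + (n-1)*max(simd, vlc) + vlc cycles instead of n*(simd + vlc). The cycle counts in the example are placeholders, not values read from Figs. 6 to 8, and bus contention is ignored.

```python
def sequential_cycles(n_mb, simd, vlc):
    """SIMD work and VLC work for each macro block run one after the other."""
    return n_mb * (simd + vlc)


def overlapped_cycles(n_mb, simd, vlc):
    """Two-stage pipeline: the VLC unit codes macro block i while the SIMD
    unit processes macro block i + 1 (buffering assumed, no bus stalls)."""
    if n_mb == 0:
        return 0
    return simd + (n_mb - 1) * max(simd, vlc) + vlc


# Placeholder cycle counts per macro block, for illustration only.
print(sequential_cycles(4, 300, 200), overlapped_cycles(4, 300, 200))  # 2000 1400
```

For long sequences the cost per macro block approaches max(simd, vlc) rather than simd + vlc, so the slower of the two units sets the throughput.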

As mentioned above, in accordance with the first 
embodiment of the present invention, the image processing 
device includes the SIMD calculating unit 101 for performing 
operations, such as motion compensation, motion prediction, DCT 
processing, IDCT processing, quantization, and reverse 
quantization, and the VLC processing unit 102 for performing 
variable-length encoding processing and variable-length 
decoding processing according to a given encoding method. The 
image processing device of the first embodiment can thus support 
various encoding methods, and can reduce the number of clock 
cycles required for image processing. 

Embodiment 2. 

An image processing device according to a second 
embodiment of the present invention includes, as the instruction 
memory 104 shown in Fig. 1, a RAM (Random Access Memory) into 
which instructions can be downloaded from outside the image 
processing device. The rest of the structure of the image 
processing device according to the second embodiment is the same 
as that of the image processing device according to the first 
embodiment. The image processing device according to the second 
embodiment operates in the same way as the image processing 
device according to the first embodiment, except that 
instructions are downloaded into the RAM. 

As mentioned above, in accordance with the second 
embodiment of the present invention, since the image processing 
device includes the RAM into which instructions can be 
downloaded from outside the image processing device, the image 
processing device can support various encoding methods with a 
single LSI. 
Embodiment 3. 

An image processing device according to a third 
embodiment of the present invention includes a low-cost, 
small-size ROM (Read Only Memory) as the instruction memory 104 
shown in Fig. 1. The rest of the structure of the image processing 
device according to the third embodiment is the same as that 
of the image processing device according to the first embodiment. 
The image processing device according to the third embodiment 
operates in the same way as the image processing device 
according to the first embodiment. 

As mentioned above, in accordance with the third 
embodiment of the present invention, since the image processing 
device includes the ROM, the area of the LSI and the cost of the 
image processing device can be reduced. 

In the above-mentioned embodiments, encoding processing is 
described as an example of the operation of the image processing 
device. However, the present invention is not limited to an 
image processing device for performing encoding processing, and 
the image processing device of the present invention can also 
perform decoding processing. 

In the above-mentioned first embodiment, DCT processing 
is illustrated as an example of the operation of the SIMD 
calculating unit 101. However, the SIMD calculating unit 101 can 
also carry out processing such as motion prediction, IDCT 
processing, quantization, reverse quantization, or filter 
processing by means of the adder/subtracter 351, the multiplier 
352, the difference calculator 353, the accumulator 354, the 
shifting/rounding unit 355, and the clipping unit 356. In other 
words, the SIMD calculating unit 101 according to the present 
invention is not limited to performing only DCT processing. 
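As one hedged example of such other processing, motion prediction is often performed by block matching with the sum of absolute differences (SAD) as the cost. The description does not name a matching criterion, so SAD and exhaustive search are assumptions here. Assuming the difference operation of the difference calculator 353 yields an absolute difference, each SAD term maps onto that unit and the running sum onto the accumulator 354. The functions `sad` and `full_search` are hypothetical, and the reference area is assumed large enough that every candidate displacement stays in bounds.

```python
# Assumed technique: SAD block matching with exhaustive (full) search.

def sad(cur, ref, top, left):
    """Sum of absolute differences between `cur` and the equally sized block of
    `ref` whose top-left corner is (top, left). Each term stands in for the
    difference calculator 353; the running sum stands in for the accumulator 354."""
    total = 0
    for i, row in enumerate(cur):
        for j, pixel in enumerate(row):
            total += abs(pixel - ref[top + i][left + j])
    return total


def full_search(cur, ref, origin, search_range):
    """Exhaustive block matching around `origin` (the block's own position in
    `ref`); returns (best SAD, (dy, dx)). The caller must keep every candidate
    block inside `ref`."""
    oy, ox = origin
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            cost = sad(cur, ref, oy + dy, ox + dx)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best


ref = [[12 * i + j for j in range(12)] for i in range(12)]  # unique pixel values
cur = [row[6:10] for row in ref[5:9]]   # 4x4 block taken from (5, 6)
print(full_search(cur, ref, origin=(4, 4), search_range=2))  # (0, (1, 2))
```

Because every candidate applies the same difference-and-accumulate sequence to different data, the candidate positions map naturally onto parallel units in the SIMD calculating unit 101.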

Many widely different embodiments of the present 
invention may be constructed without departing from the spirit 
and scope of the present invention. It should be understood 
that the present invention is not limited to the specific 
embodiments described in the specification, except as defined 
in the appended claims. 



